37 research outputs found

    BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues

    Background: Bisulfite sequencing is widely employed to study the role of DNA methylation in disease; however, the data suffer from biases due to coverage depth variability. Imputation of methylation values at low-coverage sites may mitigate these biases while also identifying important genomic features associated with predictive power. Results: Here we describe BoostMe, a method for imputing low-quality DNA methylation estimates within whole-genome bisulfite sequencing (WGBS) data. BoostMe uses a gradient boosting algorithm, XGBoost, and leverages information from multiple samples for prediction. We find that BoostMe outperforms existing algorithms in speed and accuracy when applied to WGBS of human tissues. Furthermore, we show that imputation improves concordance between WGBS and the MethylationEPIC array at low WGBS depth, suggesting improved WGBS accuracy after imputation. Conclusions: Our findings support the use of BoostMe as a preprocessing step for WGBS analysis. https://deepblue.lib.umich.edu/bitstream/2027.42/143848/1/12864_2018_Article_4766.pd

    Genetic regulatory signatures underlying islet gene expression and type 2 diabetes

    The majority of genetic variants associated with type 2 diabetes (T2D) are located outside of genes in noncoding regions that may regulate gene expression in disease-relevant tissues, like pancreatic islets. Here, we present the largest integrated analysis to date of high-resolution, high-throughput human islet molecular profiling data to characterize the genome (DNA), epigenome (DNA packaging), and transcriptome (gene expression). We find that T2D genetic variants are enriched in regions of the genome where the transcription factor Regulatory Factor X (RFX) is predicted to bind in an islet-specific manner. Genetic variants that increase T2D risk are predicted to disrupt RFX binding, providing a molecular mechanism to explain how the genome can influence the epigenome, modulating gene expression and ultimately T2D risk.

    Interactions between genetic variation and cellular environment in skeletal muscle gene expression

    From whole organisms to individual cells, responses to environmental conditions are influenced by genetic makeup, where the effect of genetic variation on a trait depends on the environmental context. RNA-sequencing quantifies gene expression as a molecular trait, and is capable of capturing both genetic and environmental effects. In this study, we explore opportunities for using allele-specific expression (ASE) to discover cis-acting genotype-environment interactions (GxE), that is, genetic effects on gene expression that depend on an environmental condition. Treating 17 common clinical traits as approximations of the cellular environment of 267 skeletal muscle biopsies, we identify 10 candidate environmental response expression quantitative trait loci (reQTLs) across 6 traits (12 unique gene-environment trait pairs; 10% FDR per trait) including sex, systolic blood pressure, and low-density lipoprotein cholesterol. Although using ASE is in principle a promising approach to detect GxE effects, replication of such signals can be challenging as validation requires harmonization of environmental traits across cohorts and a sufficient sampling of heterozygotes for a transcribed SNP. Comprehensive discovery and replication will require large human transcriptome datasets, or the integration of multiple transcribed SNPs, coupled with standardized clinical phenotyping. Peer reviewed.

    Multiomic Profiling Identifies cis-Regulatory Networks Underlying Human Pancreatic β Cell Identity and Function.

    EndoC-βH1 is emerging as a critical human β cell model to study the genetic and environmental etiologies of β cell (dys)function and diabetes. Comprehensive knowledge of its molecular landscape is lacking, yet required, for effective use of this model. Here, we report chromosomal (spectral karyotyping), genetic (genotyping), epigenomic (ChIP-seq and ATAC-seq), chromatin interaction (Hi-C and Pol2 ChIA-PET), and transcriptomic (RNA-seq and miRNA-seq) maps of EndoC-βH1. Analyses of these maps define known (e.g., PDX1 and ISL1) and putative (e.g., PCSK1 and mir-375) β cell-specific transcriptional cis-regulatory networks and identify allelic effects on cis-regulatory element use. Importantly, comparison with maps generated in primary human islets and/or β cells indicates preservation of chromatin looping but also highlights chromosomal aberrations and fetal genomic signatures in EndoC-βH1. Together, these maps, and a web application we created for their exploration, provide important tools for the design of experiments to probe and manipulate the genetic programs governing β cell identity and (dys)function in diabetes.

    Genetic variant effects on gene expression in human pancreatic islets and their implications for T2D

    Most signals detected by genome-wide association studies map to non-coding sequence and their tissue-specific effects influence transcriptional regulation. However, key tissues and cell-types required for functional inference are absent from large-scale resources. Here we explore the relationship between genetic variants influencing predisposition to type 2 diabetes (T2D) and related glycemic traits, and human pancreatic islet transcription using data from 420 donors. We find: (a) 7741 cis-eQTLs in islets with a replication rate across 44 GTEx tissues between 40% and 73%; (b) marked overlap between islet cis-eQTL signals and active regulatory sequences in islets, with reduced eQTL effect size observed in the stretch enhancers most strongly implicated in GWAS signal location; (c) enrichment of islet cis-eQTL signals with T2D risk variants identified in genome-wide association studies; and (d) colocalization between 47 islet cis-eQTLs and variants influencing T2D or glycemic traits, including DGKB and TCF7L2. Our findings illustrate the advantages of performing functional and regulatory studies in disease-relevant tissues. Peer reviewed.

    The trans-ancestral genomic architecture of glycemic traits

    Glycemic traits are used to diagnose and monitor type 2 diabetes and cardiometabolic health. To date, most genetic studies of glycemic traits have focused on individuals of European ancestry. Here we aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P < 5 × 10^-8), 80% of which had no significant evidence of between-ancestry heterogeneity. Analyses restricted to individuals of European ancestry with equivalent sample size would have led to 24 fewer new loci. Compared with single-ancestry analyses, equivalent-sized trans-ancestry fine-mapping reduced the number of estimated variants in 99% credible sets by a median of 37.5%. Genomic-feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. Our results increase our understanding of diabetes pathophysiology by using trans-ancestry studies for improved power and resolution. A trans-ancestry meta-analysis of GWAS of glycemic traits in up to 281,416 individuals identifies 99 novel loci, of which one quarter was found due to the multi-ancestry approach, which also improves fine-mapping of credible variant sets.

    Understanding the Genetics of Gene Regulation Using Multi-Omics Profiling

    Type 2 diabetes (T2D) is a complex disease that affects an estimated 415 million people worldwide. Genome-wide association studies (GWAS) have identified >240 genetic signals that encode predisposition to this disease and related traits. However, the underlying biological mechanisms driving this predisposition are largely unknown, which is a serious impediment in designing precision therapeutic strategies. The focus of my research is to untangle the genetic complexity of T2D to better understand the biological mechanisms of how disease predisposition is encoded in our DNA. Specifically, I aim to understand how T2D genetic risk variants modulate gene expression in orchestrating disease mechanisms. I utilize high-throughput molecular profiling data in human pancreatic islets and other diverse tissues along with human and rodent cell line model systems and employ computational and experimental approaches to map functional signatures of genetic variants associated with T2D. First, I compared gene regulatory annotations defined using diverse epigenomic data across 4 cell types, examining their cell specificities and genetics of gene expression regulation. I observed that genetic variants in genomic regions with more cell type-specific enhancer chromatin have lower effects on gene expression than variants in genomic regions with more ubiquitous promoter chromatin. However, genetic variants in cell type-specific enhancer regions have higher effects on chromatin accessibility than those in less cell type-specific promoter regions. Second, I integrated GWAS data with various -omics data in islets to nominate biological mechanisms. I observed that T2D risk variants confluently disrupt DNA binding motifs of the transcription factor (TF) regulatory factor X (RFX) in accessible regions. Third, I describe large-scale expression quantitative trait locus (eQTL) mapping efforts along with integration of epigenomic data to describe molecular regulatory mechanisms. Using such large eQTL datasets and integrating information such as chromatin accessibility and TF binding predictions helped elucidate in vivo TF activity preferences. Fourth, I describe profiling and analysis of the enhancer transcriptome in islets, which I then integrate with other available epigenomic data to better understand the characteristics of gene regulation.
    PhD, Human Genetics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/151691/1/arushiv_1.pd

    Super Enhancers in Cancers, Complex Disease, and Developmental Disorders

    Recently, unique areas of transcriptional regulation termed super-enhancers have been identified and implicated in human disease. Defined by their magnitude of size, transcription factor density, and binding of transcriptional machinery, super-enhancers have been associated with genes driving cell differentiation. While their functions are not completely understood, it is clear that these regions driving high-level transcription are susceptible to perturbation, and trait-associated single nucleotide polymorphisms (SNPs) occur within super-enhancers of disease-relevant cell types. Here we review evidence for super-enhancer involvement in cancers, complex diseases, and developmental disorders and discuss interactions between super-enhancers and cofactors/chromatin regulators.